Metric learning for unsupervised phoneme segmentation
Authors
Abstract
Unsupervised phoneme segmentation aims to divide a speech stream into phonemes without any prior knowledge of linguistic content or acoustic models. In [1], we formulated this task as an optimization problem and developed an objective function, the summation of squared error (SSE), based on the Euclidean distance between cepstral features. However, it is unknown whether Euclidean distance is the best metric for estimating the goodness of a segmentation. In this paper, we study how to learn a metric that improves segmentation performance. We propose two criteria for metric learning: Minimum of Summation Variance (MSV) and Maximum of Discrimination Variance (MDV). Experimental results on the TIMIT database indicate that a learned metric achieves better segmentation performance. The best recall rate in this paper is 81.8% (20 ms windows), compared to 77.5% in [1]. We also introduce an iterative algorithm that learns the metric without labeled data and achieves results similar to those obtained with labeled data.
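To make the role of the metric concrete, the following is a minimal sketch of an SSE-style segmentation objective, assuming the SSE is the sum of squared distances from each frame to the mean of its segment. The function name `segmentation_sse`, its arguments, and the use of a matrix `metric` in place of the identity are illustrative assumptions; this is not the MSV or MDV criterion, which the abstract does not specify.

```python
import numpy as np


def segmentation_sse(features, boundaries, metric=None):
    """Sum of squared distances from each frame to its segment mean.

    features:   (T, D) array of per-frame cepstral vectors.
    boundaries: strictly increasing frame indices in (0, T) at which a
                new segment starts (hypothetical convention).
    metric:     optional (D, D) symmetric positive semi-definite matrix M.
                None means the identity, i.e. plain Euclidean distance.
    """
    features = np.asarray(features, dtype=float)
    n_frames, dim = features.shape
    if metric is None:
        metric = np.eye(dim)
    edges = [0, *boundaries, n_frames]
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        segment = features[start:end]
        centered = segment - segment.mean(axis=0)
        # Squared distance (x - mu)^T M (x - mu), summed over the frames.
        total += np.einsum("td,de,te->", centered, metric, centered)
    return float(total)
```

Under this sketch, a lower value means the frames inside each segment are more homogeneous. Changing `metric` reweights the feature dimensions, so the same candidate boundaries can score differently under a learned metric than under Euclidean distance.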
Similar resources
Unsupervised Phoneme Segmentation Using Mahalanobis Distance
One of the fundamental problems in speech engineering is phoneme segmentation. Approaches to phoneme segmentation can be divided into two categories: supervised and unsupervised segmentation. The approach of this paper belongs to the second category, which performs phonetic segmentation without any prior knowledge of linguistic content or acoustic models. In an earlier wor...
Unsupervised Phoneme Segmentation Using Transformed Cepstrum Features
One of the basic problems in speech engineering is phoneme segmentation, that is, to divide a speech stream into a string of phonemes. Automatic Speech Recognition (ASR) models often require reliable phoneme segmentation in the initial training phase, and Text-to-Speech (TTS) systems need a large speech database with correct phoneme segmentation information for improving the performance. Human ...
A Language-Independent Unsupervised Model for Morphological Segmentation
Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological...
A neural network model of lexical segmentation and recognition
A neural network model is presented for generating a representation of words from input phoneme sequences. It uses an unsupervised learning algorithm that compares the current input with its memory of previous sequences and generates a new representation of the common subsequence. Although the generated representation is quite noisy, the model can extract the consistent pairs of subsequen...
Large-Margin Metric Learning for Partitioning Problems
In this paper, we consider unsupervised partitioning problems, such as clustering, image segmentation, video segmentation and other change-point detection problems. We focus on partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, which include mean-based change-point detection, K-means, spectral clustering and normalized cuts. Our main goal is to le...